Modern software systems, like GNU/Linux distributions or Eclipse-based development environments,
are often deployed by selecting components out of large component repositories. Maintaining such
software systems by performing component upgrades is a complex task, and users need an
expressive preference language at their disposal to specify the kind of upgrades they are interested
in. Recent research has shown that it is possible to develop solvers that handle preferences expressed
as a combination of a few basic criteria used in the MISC competition, ranging from the number
of new components to the freshness of the final configuration. In this work we introduce a set of
new criteria that allow users to specify their preferences for solutions with components aligned to
the same upstream sources, provide an efficient encoding, and report on experimental results
showing that optimising these alignment criteria is tractable in practice.

1 Introduction 

Recent research, in part fostered by the Mancoosi project, has focused on the complex problem of
handling upgrades in component-based software systems, with particular attention to the case of
GNU/Linux distributions, which contain several tens of thousands of components. Installing
components (called packages in the world of distributions) may be complex: each component may need some
extra components to be installed, as described in its metadata by dependencies, and may be incompatible
with some other ones, as described in its metadata by conflicts. Indeed, determining whether a
component can be installed is NP-complete [7], but problem instances arising in practice turn out to be tractable
by modern solvers. These practical results opened the way to explore not just the question of
finding a way of installing some components, but the best way of doing so, according to some criteria
that capture the user preferences and needs.

The Mancoosi International Solver Competition (MISC) was established with the goal of distilling
interesting problems from real-world GNU/Linux distribution upgrade scenarios and presenting them to the
solver research community. The problems are encoded in documents written using a common format,
CUDF [9], that describe the universe of available components with their interdependencies and the user
request; the solvers are requested to find solutions that are ranked according to user preferences, which
are currently built by composing a few basic criteria using aggregation functions like the lexicographic
ordering.

Table 1: Optimization criteria

$$
\begin{array}{lcl}
\mathit{removed}(I,S) & = & \{\mathit{name} \mid V_p(I,\mathit{name}) \neq \emptyset \text{ and } V_p(S,\mathit{name}) = \emptyset\} \\
\mathit{new}(I,S) & = & \{\mathit{name} \mid V_p(I,\mathit{name}) = \emptyset \text{ and } V_p(S,\mathit{name}) \neq \emptyset\} \\
\mathit{changed}(I,S) & = & \{\mathit{name} \mid V_p(I,\mathit{name}) \neq V_p(S,\mathit{name})\} \\
\mathit{notuptodate}(I,S) & = & \{\mathit{name} \mid V_p(S,\mathit{name}) \neq \emptyset \text{ and does not contain the most recent version of } \mathit{name} \text{ in } S\} \\
\mathit{unsatrec}(I,S) & = & \{(\mathit{name},v,c) \mid v \in V_p(S,\mathit{name}) \text{ and } (\mathit{name},v) \text{ recommends } \ldots,c,\ldots \text{ and } c \text{ is not satisfied by } S\}
\end{array}
$$

There are five basic criteria currently used in MISC: removed, changed, new, notuptodate and
unsatrecommends, which capture intuitive properties of a solution to an upgrade problem, like the number
of removed components or the number of components that are not the most up to date. They are
summarised in Table 1, where $I$ is the initial installation and $S$ is a proposed new installation. We write
$V_p(X,\mathit{name})$ for the set of versions in which name (the name of a component) is installed in $X$, where
$X$ may be $I$ or $S$. That set may be empty (name is not installed), contain one element (name is installed
in exactly that version), or even contain multiple elements in case a component is installed in multiple
versions. These criteria and aggregation functions are an important starting point for this research, but
are not sufficient to capture all the important properties of an upgrade: identifying new basic criteria
and new aggregation functions is an important activity that will help improve the algorithms and tools
available for maintaining the complex software systems of tomorrow.

Contribution In the context of complex software systems, we can expect configurations containing
synchronised components to be more robust, for multiple reasons: synchronised components have in
general been developed together, and more thoroughly tested. Component metadata may contain, or can be
enriched with, information about synchronisation (for example, via a Source version field), which can be
exploited to search for synchronised configurations.

In this paper, we present a new criterion, component alignment, which measures the synchronization 
of closely related components in an installation and which is not expressible using the existing criteria 
used in the MISC competition. Then, we show how to encode it using current solver technology, and 
present experimental results that show that it is tractable in practice. 



2 Component alignment 

In complex software systems, like GNU/Linux distributions, components do not exist in isolation, but
are very often related to each other, even if they may be installed independently: the documentation of a
program, for example, is not necessary to run it, but if they are both present, the user expects them to be of
the same version, or, in other terms, to be aligned.

With the current basic preferences used in MISC, it is not possible to express this alignment
property, and one can see that even the best solutions in the MISC 2010 competition may contain strange
component combinations: for example, the solutions found by the competition entrants for the problem
eeee44ce in the trendy track contain surprising combinations like

Package              Version
aptitude-doc-fr      0.6.1.5-3
aptitude             0.4.11.11-1+b2

or even

Package              Version
linux-libc-dev       2.6.32-9
linux-source-2.6.30  2.6.30-8squeeze1

These are potential sources of confusion for a user who finds documentation way ahead of the installed
binaries, or sources way behind the installed libraries. In the case of mixed versions of important libraries,
like gs in version 8.64~dfsg-1+squeeze1, which is used with gs-common version 8.71~dfsg-4 in
the same example, one can even experience real incompatibilities, due to the combination of components
that have not been thoroughly tested together.



Obtaining an aligned installation by hand is quite painful, because of the number of involved
packages: it is really necessary to be able to express the preference concisely via a criterion. To help the users
that want to avoid these inconsistencies, we propose to exploit the information about the package source,
which is present in the metadata of mainstream distributions.

In Debian, for example, packages which are built from the same source package carry in their
metadata two pieces of information:

• a source property, that specifies the name of the source package; for example, both packages related
to the Linux kernel in the example above have a source property with the same value linux-2.6;

• the version of the source used to build them; this information is encoded in the CUDF documents
coming from Debian distributions in a sourceversion property; for example, the two packages
related to the Linux kernel in the example above are built from two different versions, 2.6.32-9
and 2.6.30-8squeeze1, of the same source linux-2.6.

Using this information, one can define what it means for an installation to be aligned. 

Definition 1 (Alignment) An installation $I$ is source aligned if all installed packages built from the same
source $s$ are actually built from the same version of this source.

In other terms, $I$ is aligned if all packages $p_i$ having the same value for the source property also have
the same value of the sourceversion property.
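As an illustration of Definition 1, the following minimal sketch checks whether an installation is source aligned. The representation of an installation as (package, source, sourceversion) triples and the function name are assumptions made for this example; they are not part of CUDF or of any existing tool.

```python
from collections import defaultdict

def is_source_aligned(installed):
    """Return True if all installed packages sharing a source share its version.

    `installed` is an iterable of (package, source, sourceversion) triples;
    this triple layout is an assumption made for the example.
    """
    versions_by_source = defaultdict(set)
    for _package, source, source_version in installed:
        versions_by_source[source].add(source_version)
    # Aligned means: no source is present in more than one version.
    return all(len(versions) == 1 for versions in versions_by_source.values())

# The two kernel-related packages discussed earlier in this section.
mixed = [
    ("linux-libc-dev", "linux-2.6", "2.6.32-9"),
    ("linux-source-2.6.30", "linux-2.6", "2.6.30-8squeeze1"),
]
print(is_source_aligned(mixed))      # False: one source, two source versions
print(is_source_aligned(mixed[:1]))  # True: a single package is trivially aligned
```

The check only looks at the source and sourceversion values, never at the package versions themselves.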

We remark here that the version of a package and the version of the source from which it is
built do not necessarily coincide, and packages built from the same version of the same source may carry
different package versions. Using the version of the source as an alignment criterion is therefore the best
way of knowing whether a set of packages is aligned, without the need to guess similarity of packages
by inspecting their package versions.



^ See http://data.mancoosi.org/misc2010/results/problems/debian-dudf/eeee44ce-5407-11df-b11f-00163e7a6f5e
Notations In the following, we write $\mathcal{S}$ for the set of sources of the problem to solve.

We note $\{p_i\}_{i=1..n}$ the set of all available packages. To simplify the notation, $p_i$ will also denote
a 0-1 variable that expresses that package $p_i$ is installed; when the context is not enough to resolve the
ambiguity we write package $p_i$ or variable $p_i$.

The relationships between the sources, the packages and their versions will be expressed with the
following functions:

• $V(s)$ denotes the set of versions of source $s \in \mathcal{S}$;

• $V(p)$ denotes the version of the source of package $p$;

• $P(s,v)$ denotes the set of packages belonging to version $v$ of source $s$;

• $S(p)$ denotes the value of the source property of package $p$.

For example, $p \in P(s,v) \iff V(p) = v$ and $S(p) = s$.



3 Measuring unalignment 

In order to choose among different possible installations, we need to be able to measure how far we are 
from an aligned solution; for this, we need a measure of unalignment of a solution to a user query, that 
can be then used as an objective function to minimize. 

It turns out that there are quite a few different ways of defining such a notion, with varying cost
and expressiveness. We discuss them in the following sections, where we present the different possible
definitions. An encoding for MIP solvers, along the lines of [11], is given in detail in Section 4.



3.1 Counting unaligned packages 

A first approach to building a measure of unalignment is to count the number of packages $p_i$ which are
installed and not source aligned. This can be expressed formally as the cardinality of a set:

$$\mathit{unaligned}_p = \mathrm{card}\,\{p_i \mid i \in [1..n],\ p_i = 1,\ \exists j,\ p_j = 1,\ S(p_j) = S(p_i),\ V(p_j) \neq V(p_i)\}$$

Note that, in our notation, $p_i = 1$ and $p_j = 1$ mean that packages $p_i$ and $p_j$ are installed.
The above set contains all packages that are installed and such that another package with the same source
in another version is also installed.

To obtain an installation that is as aligned as possible, it is then enough to minimize $\mathit{unaligned}_p$, the
cardinality of the set.
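The set above can be evaluated directly on a candidate installation. The sketch below does so for a small, made-up installation; the dictionary layout and the package names are hypothetical and serve only to illustrate the definition.

```python
def unaligned_packages(installed):
    """Count installed packages that share a source with an installed package
    built from a different source version (the set defining unaligned_p).

    `installed` maps package name -> (source, source_version); hypothetical layout.
    """
    count = 0
    for _p_i, (s_i, v_i) in installed.items():
        # p_i is unaligned if some installed p_j has the same source, another version.
        if any(s_j == s_i and v_j != v_i for (s_j, v_j) in installed.values()):
            count += 1
    return count

installed = {
    "a-bin": ("a", "1.0"),
    "a-doc": ("a", "1.1"),
    "a-dev": ("a", "1.0"),
    "b-bin": ("b", "2.0"),
}
print(unaligned_packages(installed))  # 3: every installed package of source "a"
```

Note that a-bin and a-dev agree with each other but still count, because a-doc is installed in another version of the same source.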



3.2 Counting (sorted) unaligned package pairs 

A second approach is to count the number of pairs of packages $(p_i,p_j)$ which are both installed and not
aligned. This can be done by computing the cardinality of a slightly different set:

$$\mathit{unaligned}_{pp} = \mathrm{card}\,\{(p_i,p_j) \mid i,j \in [1..n],\ i < j,\ p_i = 1,\ p_j = 1,\ S(p_j) = S(p_i),\ V(p_j) \neq V(p_i)\}$$

The interest of this approach is that it is much more discriminating than the $\mathit{unaligned}_p$ criterion (see
Section 3.5). Nevertheless, a drawback is that, as it implicitly weights a cluster up to the square
of its size, a small qualitative improvement of a large and very unaligned cluster may strongly dominate
clear qualitative improvements of some other smaller or almost aligned clusters.
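The quadratic growth mentioned above can be observed on a small experiment. The sketch below counts unaligned pairs by enumeration and, as a cross-check, with a closed form derived here for illustration: for a cluster of n installed packages with n_v packages in version v, the number of unaligned pairs is (n^2 - sum of n_v^2) / 2. The data layout and function names are assumptions of this example.

```python
from collections import Counter
from itertools import combinations

def unaligned_pairs(installed):
    """Count unordered pairs of installed packages with the same source and
    different source versions. `installed` is a list of (source, version)."""
    return sum(1 for (s1, v1), (s2, v2) in combinations(installed, 2)
               if s1 == s2 and v1 != v2)

def unaligned_pairs_closed_form(installed):
    """Same count from per-version sizes: (n^2 - sum n_v^2) / 2 per source."""
    per_source = {}
    for source, version in installed:
        per_source.setdefault(source, Counter())[version] += 1
    total = 0
    for counts in per_source.values():
        n = sum(counts.values())
        total += (n * n - sum(c * c for c in counts.values())) // 2
    return total

# A cluster of k packages split evenly over two versions yields (k/2)^2 pairs.
for k in (2, 4, 8, 16):
    cluster = [("big", "v1")] * (k // 2) + [("big", "v2")] * (k // 2)
    assert unaligned_pairs(cluster) == unaligned_pairs_closed_form(cluster)
    print(k, unaligned_pairs(cluster))  # prints 2 1, 4 4, 8 16, 16 64
```

Doubling the size of the cluster multiplies its weight by four, which is the domination effect described above.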
3.3 Counting version changes

In this third approach, the size of the cluster is not as important as in the $\mathit{unaligned}_{pp}$ criterion: it counts
the number of version changes in a cluster. For example, consider a cluster with six installed packages
that involve three different source versions: there will be two version changes. Formally:

$$\mathit{unaligned}_{vc} = \sum_{s \in \mathcal{S}} \max(0, \mathit{numberOfVersions}(s) - 1)$$

where:

$$\mathit{numberOfVersions}(s) = \mathrm{card}\,\{V(p_i) \mid i \in [1..n],\ p_i = 1,\ S(p_i) = s\}$$

Note that $\mathit{numberOfVersions}(s)$ is the number of installed versions of the source $s$; thus, when this
number is greater than 0, we need to subtract 1 to get the number of version changes.

3.4 Counting unaligned source clusters 

Finally, one can use a much coarser granularity, counting only the source clusters which are unaligned,
independently of the number of pointwise unalignments among packages of the same cluster, by using

$$\mathit{unaligned}_c = \mathrm{card}\,\{s \mid s \in \mathcal{S},\ \exists i \in [1..n],\ \exists j \in [1..n],\ p_i = 1,\ p_j = 1,\ S(p_j) = S(p_i) = s,\ V(p_j) \neq V(p_i)\}$$

3.5 Discussion of the different alignment criteria 

The different alignment criteria differ by their weighting policies. The number of unaligned source
clusters $\mathit{unaligned}_c$ and the number of unaligned packages $\mathit{unaligned}_p$ are very close, except that the
criterion $\mathit{unaligned}_c$ does not take into account the size of the clusters, whereas the criterion $\mathit{unaligned}_p$
weights a cluster by its size (each time a cluster of size $k$ is unaligned, $k$ packages are unaligned). The
criterion $\mathit{unaligned}_{pp}$ is more discriminating by weighting a cluster by its pairwise unalignment, which
may be really interesting, but it makes the implicit assumption that packages of a cluster are totally
interdependent. When this assumption is too strong and the size of the cluster is large, the weight
of a $k$-sized cluster can be as large as $k^2$, and alignments in large clusters may dominate too strongly
alignments in small clusters. The criterion $\mathit{unaligned}_{vc}$, based on version changes, provides an interesting
intermediate solution: the weight of the cluster is the number of different versions in that cluster.

To see in practice what each of the above criteria actually captures, it is useful to compare the results
on a simple example. Let us consider a cluster $c = \{p_1,p_2,p_3,p_4\}$ comprising 4 packages of the same
source, with package versions among 1, 2, 3, 4, and a few possible unaligned configurations.
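A comparison of this kind can be sketched as follows. The five configurations are chosen for this illustration and are not necessarily the ones the authors tabulate; the function name and the compact per-cluster formulas are likewise assumptions of the example, derived from the definitions of Sections 3.1 to 3.4 for a single cluster whose packages are all installed.

```python
from collections import Counter

def measures(versions):
    """Return (unaligned_p, unaligned_pp, unaligned_vc, unaligned_c) for one
    cluster, given the source version of each installed package."""
    counts = Counter(versions)
    n = len(versions)
    distinct = len(counts)
    unaligned = distinct > 1
    p = n if unaligned else 0          # every package has a mismatching peer
    pp = (n * n - sum(c * c for c in counts.values())) // 2
    vc = max(0, distinct - 1)
    c = 1 if unaligned else 0
    return p, pp, vc, c

# Illustrative configurations (source versions of p1..p4), chosen for this sketch.
configurations = [(1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 2, 2), (1, 1, 2, 3), (1, 2, 3, 4)]
for config in configurations:
    print(config, measures(config))
# (1, 1, 1, 1) (0, 0, 0, 0)
# (1, 1, 1, 2) (4, 3, 1, 1)
# (1, 1, 2, 2) (4, 4, 1, 1)
# (1, 1, 2, 3) (4, 5, 2, 1)
# (1, 2, 3, 4) (4, 6, 3, 1)
```

On these configurations the packages and clusters measures cannot tell the four unaligned cases apart, the version changes measure separates them by the number of versions involved, and the pairs measure gives a different value to each.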
4 Efficiently encoding the criteria using MIP

This section describes an integer programming encoding of the unalignment criteria presented above. It is
particularly efficient in practice with a MIP solver. Note that a clausal form of these criteria can also be
obtained for use with a SAT solver (see the Appendix).

As a first step, the problem is reduced to the subset of sources with more than one source version.

4.1 packages

The number of unaligned packages is computed using the following formula

$$nu_{packages} = \sum_{p_j \in P(s,v),\ v \in V(s),\ s \in \mathcal{S}} nu_{p_j}$$

where $nu_{p_j}$ is a binary variable whose value is one if $p_j$ is installed and not aligned and zero otherwise.
Each $nu_{p_j}$ is handled by the following set of constraints

$$nu_{p_j} \leq p_j$$

which forces $nu_{p_j}$ to 0 if package $p_j$ is not installed, and

$$nu_{p_j} \leq \sum_{v \in V(S(p_j)),\ v \neq V(p_j)} i_{s,v}$$

where $s = S(p_j)$ and $i_{s,v}$ is a binary variable whose value is 1 if any package of version $v$ from source
$s$ is installed and zero otherwise. Therefore, the previous constraint forces $nu_{p_j}$ to zero if none of the
other versions of source $s$, different from $V(p_j)$, has an installed package. $nu_{p_j}$ is also involved in the
following set of constraints

$$\forall v \in V(S(p_j)),\ v \neq V(p_j): \quad nu_{p_j} + 1 \geq p_j + i_{s,v}$$

which ensures that if $p_j$ is installed and one of the versions of $s$ different from the source version of $p_j$
has an installed package, then $nu_{p_j}$ is set to one.

Finally, constraints are added to handle the $i_{s,v}$ variables. The first constraint ensures that $i_{s,v}$ gets the
value zero if none of the packages of version $v$ from source $s$ is installed

$$i_{s,v} \leq \sum_{p_j \in P(s,v)} p_j$$

The second set of constraints sets $i_{s,v}$ to 1 whenever at least one of the packages of version $v$ from source
$s$ is installed

$$\forall p_j \in P(s,v): \quad p_j \leq i_{s,v}$$

Note that variables $i_{s,v}$ are also used in the encoding of the two last unalignment criteria.
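One way to convince oneself that these constraints behave as described is to enumerate all 0-1 assignments on a tiny instance. The sketch below does this by brute force, without any MIP solver, for a hypothetical source with two versions and three packages; the instance, the variable names and the helper functions are all assumptions of this example.

```python
from itertools import product

# Hypothetical toy instance: one source with versions "v1" and "v2".
VERSION = {"a": "v1", "b": "v1", "c": "v2"}
PACKAGES = sorted(VERSION)
VERSIONS = sorted(set(VERSION.values()))

def feasible(p, i, nu):
    """Check the constraints of this section for one 0-1 assignment."""
    for v in VERSIONS:
        members = [q for q in PACKAGES if VERSION[q] == v]
        if i[v] > sum(p[q] for q in members):        # i_{s,v} <= sum of p_j
            return False
        if any(p[q] > i[v] for q in members):        # p_j <= i_{s,v}
            return False
    for q in PACKAGES:
        others = [v for v in VERSIONS if v != VERSION[q]]
        if nu[q] > p[q]:                             # nu_{p_j} <= p_j
            return False
        if nu[q] > sum(i[v] for v in others):        # nu_{p_j} <= sum of i_{s,v}
            return False
        if any(nu[q] + 1 < p[q] + i[v] for v in others):  # nu_{p_j} + 1 >= p_j + i_{s,v}
            return False
    return True

def solutions(p):
    """All (i, nu) assignments compatible with a given installation p."""
    found = []
    for i_bits in product((0, 1), repeat=len(VERSIONS)):
        for nu_bits in product((0, 1), repeat=len(PACKAGES)):
            i = dict(zip(VERSIONS, i_bits))
            nu = dict(zip(PACKAGES, nu_bits))
            if feasible(p, i, nu):
                found.append((i, nu))
    return found

for bits in product((0, 1), repeat=len(PACKAGES)):
    p = dict(zip(PACKAGES, bits))
    sols = solutions(p)
    assert len(sols) == 1                            # the constraints pin every variable
    _, nu = sols[0]
    for q in PACKAGES:
        mismatch = any(p[r] and VERSION[r] != VERSION[q] for r in PACKAGES)
        assert nu[q] == int(p[q] and mismatch)
print("constraints determine nu_p for all 8 installations")
```

On this instance the auxiliary variables are fully determined by the installation, so the objective value does not depend on the solver's choices for them.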
4.2 pairs

The number of unaligned pairs is given by

$$\sum_{p_j \in P(s,v),\ v \in V(s),\ s \in \mathcal{S},\ p_k \in P(s,v'),\ v' \in V(s),\ v' \neq v} u_{p_j,p_k}$$

where each $u_{p_j,p_k}$ is subject to the three following constraints:

$$u_{p_j,p_k} \leq p_j \quad\wedge\quad u_{p_j,p_k} \leq p_k \quad\wedge\quad u_{p_j,p_k} + 1 \geq p_j + p_k$$

The first two constraints ensure that $u_{p_j,p_k} = 0$ if either $p_j$ or $p_k$ is not installed. The last constraint
forces $u_{p_j,p_k} = 1$ when both $p_j$ and $p_k$ are installed, so that $u_{p_j,p_k} = 1$ iff both are installed.
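The three constraints are the usual linearisation of a logical AND over 0-1 variables. The short sketch below, whose function name is chosen for this illustration, enumerates the four possible installations of a pair and lists the values of the pair variable that remain admissible.

```python
from itertools import product

def admissible_u(p_j, p_k):
    """Values of u in {0, 1} satisfying the three constraints for given p_j, p_k."""
    return [u for u in (0, 1)
            if u <= p_j and u <= p_k and u + 1 >= p_j + p_k]

for p_j, p_k in product((0, 1), repeat=2):
    print(p_j, p_k, admissible_u(p_j, p_k))
# 0 0 [0]
# 0 1 [0]
# 1 0 [0]
# 1 1 [1]
```

Exactly one value is admissible in each case, and it equals 1 only when both packages are installed.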

4.3 version changes 

The number $nu_{vc}$ of version changes is given by the following formula:

$$nu_{vc} = \sum_{s \in \mathcal{S}} nc_s$$

where each $nc_s$ is subject to

$$nc_s = nb_{inst,s} - \delta_s$$

and where each $\delta_s$ is subject to

$$|V(s)| \cdot \delta_s \geq nb_{inst,s} \quad\wedge\quad nb_{inst,s} \geq \delta_s$$

The first constraint sets $\delta_s$ to 1 iff $nb_{inst,s} \geq 1$, and the second one sets $\delta_s$ to 0 iff $nb_{inst,s} = 0$. The $nb_{inst,s}$
variable simply sums up the number of installed source versions (i.e., the number of source versions with
at least one installed package). Thus,

$$nb_{inst,s} = \sum_{v \in V(s)} i_{s,v}$$
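The role of the 0-1 variable attached to each source can be checked by enumeration. The sketch below assumes that the number of version changes of a source is its number of installed versions minus that 0-1 variable, which is the reading consistent with max(0, numberOfVersions(s) - 1) in Section 3.3; the function name and the choice of three versions are hypothetical.

```python
def admissible_delta(nb_inst, n_versions):
    """Values of delta in {0, 1} with n_versions * delta >= nb_inst >= delta."""
    return [d for d in (0, 1) if n_versions * d >= nb_inst and nb_inst >= d]

n_versions = 3  # |V(s)| for a hypothetical source
for nb_inst in range(n_versions + 1):
    (delta,) = admissible_delta(nb_inst, n_versions)  # exactly one admissible value
    nc = nb_inst - delta                              # assumed: nc_s = nb_inst,s - delta_s
    assert nc == max(0, nb_inst - 1)
    print(nb_inst, delta, nc)
# 0 0 0
# 1 1 0
# 2 1 1
# 3 1 2
```

The two inequalities thus turn the max of Section 3.3 into linear constraints.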

4.4 clusters 

The number of unaligned clusters of sources is given by the following formula:

$$\sum_{s \in \mathcal{S}} u_s$$

where each $u_s$ is subject to

$$|V(s)| \cdot u_s + 1 \geq nb_{inst,s} \quad\wedge\quad nb_{inst,s} \geq 2 \cdot u_s$$

The first constraint sets $u_s$ to 1 iff $nb_{inst,s} \geq 2$, while the second one forces $u_s$ to 0 iff $nb_{inst,s} \leq 1$. $nb_{inst,s}$
has the same definition as in the unaligned version changes.
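The same kind of enumeration shows how the cluster flag behaves as the number of installed versions of a source grows. The function name and the choice of three versions are assumptions of this sketch.

```python
def admissible_cluster_flag(nb_inst, n_versions):
    """Values of u in {0, 1} with n_versions * u + 1 >= nb_inst >= 2 * u."""
    return [u for u in (0, 1)
            if n_versions * u + 1 >= nb_inst and nb_inst >= 2 * u]

n_versions = 3  # |V(s)| for a hypothetical source
for nb_inst in range(n_versions + 1):
    print(nb_inst, admissible_cluster_flag(nb_inst, n_versions))
# 0 [0]
# 1 [0]
# 2 [1]
# 3 [1]
```

The flag is forced to 1 as soon as two versions of the source are installed, and to 0 otherwise.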
Figure 1: Running time (s) and number of unalignments on the MISC-2010 Debian problem instances



5 Experimental validation 

We implemented the four alignment criteria introduced above in an experimental branch of the mccs
tool, which uses MIP instead of the Boolean encodings, and includes several optimizations with respect
to the simple encodings detailed above.

We have run the solver on the Debian category of the problems of the MISC-2010 competition
and of the 4th run of the Misc Live competition with a realistic optimization function that requires, in
lexicographic order, to first minimize removals and then minimize unalignment, using each of the four
different criteria for unalignment. The results of running these experiments on an Intel Core i7-2720QM
at 2.20GHz are given in Figure 1 for the MISC-2010 competition and in Figure 2 for the 4th run of
the Misc Live competition. In these tables, the size column gives, respectively, the number of sources
(with more than one version) of the problem, the total number of versions, the total number of packages
(corresponding to the selected sources/versions), and the number of unique pairs. The removed column
gives the time (in seconds) required to optimise the problem according to the sole removed criterion,
as well as, in brackets, the number of unaligned packages, pairs, version changes and clusters of the
solution. The last four columns give the time required to solve the problem minimizing removal
and the chosen unalignment, as well as, in brackets, the number of unalignments. Note that, for the sake
of fairness, CPLEX, the underlying MIP solver, has been limited to one thread.

These two sets of results show a strong relationship between the structure of the problem, the chosen
unalignment measure and the time required to solve the problem. However, these results seem to
indicate that the version change alignment criterion offers a good trade-off between discriminating power
and running time.



^ http://users.polytech.unice.fr/~cpjm/misc/mccs.html
^ Though the two sets of Debian problems share some problem IDs, these are different problems, as testified by the problem
sizes and the different times.
Problem     Size                     removed                packages   pairs      version changes  clusters
…           …                        …                      …          …          …                8.52 (0)
d0cc7514    1400,6079,6086,24445     2.79 (337,682,68,57)   6.96 (0)   14.69 (0)  4.11 (0)         4.42 (0)
d1583bd8    1130,3886,3888,7918      2.42 (134,111,44,43)   3.98 (0)   6.67 (0)   3.58 (0)         2.85 (0)
dd08e73e    1130,3886,3888,7918      2.40 (134,111,44,43)   4.02 (0)   6.73 (0)   3.61 (0)         2.92 (0)
e69a0e36    1426,5889,5891,20929     2.92 (274,265,69,60)   5.85 (0)   12.44 (0)  4.44 (0)         4.13 (0)
e8a3eb4c    3795,12207,12207,15450   4.44 (0,0,0,0)         8.94 (0)   8.70 (0)   10.20 (0)        8.43 (0)
ff4a1d84    1224,4457,4459,10304     1.96 (217,262,69,67)   4.13 (0)   6.83 (0)   3.46 (0)         3.05 (0)
Total time                           81.00                  147.84     222.57     134.89           119.20

Figure 2: Running time (s) and number of unalignments on the Misc Live (4th run) Debian problem
instances



6 Discussion 

Aligning components in a software installation is an important issue; we have shown that it is possible
to capture this property in several ways, according to the discriminating power one looks for, and that a
state of the art MIP solver such as CPLEX has a running time on realistic use cases that is acceptable.

An important question is whether a similar performance can be attained using different solving
approaches, like PBO, MaxSat or Answer Set Programming, which are present in the MISC competition.
We propose that the different measures of unalignment introduced here be incorporated in future MISC
competitions, and that component installers offer them to the users.

For future work, it would be interesting to allow the users to fine-tune the subset of source packages
on which the alignment is required, by introducing a more general criterion unaligned(clusters: v1,...,vn),
that evaluates unalignment only on the clusters for v1, ..., vn: this does not present significant technical
difficulties and can be done by generating the constraints only for the specified source clusters.

Alignment being only a restricted definition of a more general synchronization criterion, it may be
equally important to synchronize some packages that are not built from the same sources, but are closely
related. Such synchronization relations between packages could be expressed by extending metadata.
Aligning component upgrades

A Encoding unalignment for SAT

It is possible to write natural encodings of the different criteria for SAT; we present here the ones for the
packages and package pairs criteria.

packages The definition can be encoded as

$$\mathit{unaligned}_p \iff p \wedge \bigvee_{S(q_i)=S(p) \wedge V(q_i) \neq V(p)} q_i$$

For minimizing unalignment, it is enough to use the clauses coming from the dominance relation

$$
\begin{aligned}
\mathit{unaligned}_p \Leftarrow p \wedge \bigvee_{S(q_i)=S(p) \wedge V(q_i) \neq V(p)} q_i
&\equiv \mathit{unaligned}_p \Leftarrow \bigvee_{S(q_i)=S(p) \wedge V(q_i) \neq V(p)} (p \wedge q_i) \\
&\equiv \bigwedge_{S(q_i)=S(p) \wedge V(q_i) \neq V(p)} (\mathit{unaligned}_p \Leftarrow p \wedge q_i) \\
&\equiv \bigwedge_{S(q_i)=S(p) \wedge V(q_i) \neq V(p)} (\mathit{unaligned}_p \vee \neg p \vee \neg q_i)
\end{aligned}
$$

pairs For each package pair $(p_i,p_j)$ (^) which is not aligned, build a literal $\mathit{unaligned}_{p_i,p_j}$ which is true
iff both $p_i$ and $p_j$ are installed.

$$\mathit{unaligned}_{p_i,p_j} \iff p_i \wedge p_j$$

For minimizing unalignment, it is enough to use the clauses coming from the dominance relation

$$\mathit{unaligned}_{p_i,p_j} \Leftarrow p_i \wedge p_j \;\equiv\; \neg p_i \vee \neg p_j \vee \mathit{unaligned}_{p_i,p_j}$$

^ Take $p_i < p_j$ to avoid counting the pairs twice.
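The clauses for the pairs criterion can be generated mechanically. The sketch below is one possible way of doing so, under assumptions made for this example: packages are given as (name, source, source version) triples, literals are DIMACS-style signed integers, and one fresh variable is allocated per unaligned pair. None of these conventions comes from an existing tool.

```python
from itertools import combinations

def pair_clauses(packages):
    """Build dominance clauses (not p_i) or (not p_j) or unaligned_{i,j}.

    `packages` is a list of (name, source, source_version) triples. Literals
    are DIMACS-style integers: package k is variable k+1, and a fresh variable
    is allocated per unaligned pair. Returns (clauses, pair_variables).
    """
    next_var = len(packages) + 1
    clauses, pair_vars = [], {}
    for (i, (n_i, s_i, v_i)), (j, (n_j, s_j, v_j)) in combinations(enumerate(packages), 2):
        if s_i == s_j and v_i != v_j:          # i < j, so each pair appears once
            pair_vars[(n_i, n_j)] = next_var
            clauses.append([-(i + 1), -(j + 1), next_var])
            next_var += 1
    return clauses, pair_vars

packages = [("a-bin", "a", "1.0"), ("a-doc", "a", "1.1"), ("b-bin", "b", "2.0")]
clauses, pair_vars = pair_clauses(packages)
print(clauses)    # [[-1, -2, 4]]
print(pair_vars)  # {('a-bin', 'a-doc'): 4}
```

A solver that minimizes the number of true pair variables, subject to these clauses and to the dependency and conflict clauses of the problem, then minimizes the number of unaligned pairs.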



